Group field caps shard requests per node #77047
Conversation
Force-pushed from a10af92 to 26427e9.
This approach makes compromises to keep the implementation simple. I'm looking for feedback on these decisions:
Also -- I opened this PR against 7.x since the BWC logic can affect the design. If merged I'll forward-port to master.
 * Loads the mappings for an index and computes all {@link IndexFieldCapabilities}. This
 * helper class performs the core shard operation for the field capabilities action.
 */
class FieldCapabilitiesFetcher {
This class lets us use the same per-shard logic for both the new and old execution strategies. It's not completely necessary to add it -- I could have shuffled some inner classes around to let us share this logic. However, I found this to be a nice abstraction. It helps break up TransportFieldCapabilitiesAction, which is complex, and opens the door to adding unit tests for field caps (which I hope to do in a follow-up).
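As a rough illustration of the shape this enables (the types below are simplified stand-ins, not the real Elasticsearch classes), both execution strategies end up delegating to the same per-shard entry point:

```java
import java.util.List;
import java.util.Map;

// Simplified stand-ins for illustration only; not the real Elasticsearch types.
class FieldCapsFetcherSketch {
    record IndexFieldCaps(String index, Map<String, String> fieldTypes) {}

    // The shared per-shard operation: resolve the index's mappings and compute
    // capabilities for the requested field patterns.
    static IndexFieldCaps fetch(String index, List<String> fieldPatterns) {
        // The real fetcher walks the mappings; here we just return a canned answer.
        return new IndexFieldCaps(index, Map.of("message", "text", "@timestamp", "date"));
    }

    public static void main(String[] args) {
        // Old strategy: one transport request, and one fetch, per shard.
        System.out.println(fetch("logs-1", List.of("*")));
        // New strategy: a single node-level request loops over its assigned
        // shards and reuses exactly the same fetch logic.
        for (String index : List.of("logs-1", "logs-2")) {
            System.out.println(fetch(index, List.of("*")));
        }
    }
}
```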
List<ShardId> shardIds = entry.getValue();

DiscoveryNode node = clusterState.getNodes().get(nodeId);
assert node != null;
Can node be null? I didn't think so, but TransportFieldCapabilitiesIndexAction has some logic to handle this case (Line 244 in 6a88d84):
if (node == null) {
In general the logic around failures is pretty tricky. I'd like to write more tests for failure cases once I know the overall approach is okay.
I don't think it's a valid assertion. If the node is null, we should fail the request the same way, since the node may vanish at any time. In the current model that's ok because we'd try another replica, but the new model requires trying these shards on potentially more than one node. So unless we decide that we don't want to retry on a replica, we'll need to handle failures differently.
We had a conversation offline where we decided the current strategy (where we don't retry failures) was not acceptable. For example if a shard is being relocated, the coordinator could have an out-of-date view of where it is, and the field caps request would fail. It's challenging for clients to handle these partial failures.
Maybe in an immediate follow-up, we can add support for a retry step, where we do another round of node requests to retry on other shards/replicas. Like the other follow-up around reducing responses, this one would need to ship in the same release. For now, in this PR, I'd just replace the incorrect assertion with an error.
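A minimal sketch of what replacing the assertion with a recorded error could look like on the coordinator side (all names below are illustrative, not the actual implementation):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: when the coordinator groups shards per node, a node id that is no
// longer in the cluster state becomes a recorded per-shard failure rather than an
// assertion, so the response can report it (or a later retry round can pick those
// shards up on another copy).
class NullNodeHandlingSketch {
    record ShardId(String index, int id) {}
    record ShardFailure(ShardId shardId, Exception cause) {}

    static List<ShardFailure> resolveMissingNodes(Map<String, List<ShardId>> shardsByNode,
                                                  Map<String, Object> knownNodes) {
        List<ShardFailure> failures = new ArrayList<>();
        for (Map.Entry<String, List<ShardId>> entry : shardsByNode.entrySet()) {
            if (knownNodes.containsKey(entry.getKey()) == false) {
                for (ShardId shardId : entry.getValue()) {
                    failures.add(new ShardFailure(shardId,
                        new IllegalStateException("node [" + entry.getKey() + "] is no longer in the cluster")));
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Map<String, List<ShardId>> grouped = new HashMap<>();
        grouped.put("node-gone", List.of(new ShardId("logs-1", 0)));
        // The only known node is "node-1", so the shards assigned to "node-gone" fail.
        System.out.println(resolveMissingNodes(grouped, Map.of("node-1", new Object())));
    }
}
```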
Force-pushed from 3b8023a to 8b6b977.
Pinging @elastic/es-search (Team:Search)
Thanks @jtibshirani.
I am not sure if it's a good approach because it will perform heavy work such as RBAC auth and iterating fields multiple times. Since we retrieve field caps from any matching shard of an index, I think we can pass a list of indices instead in a node-level request. The receiving node then exhaustively tries all shards (one by one) of the requested indices until it finds matching copies. The coordinating node will retry on another node for indices without matching shards. WDYT?
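Roughly, the node-level handler in that model might look like the sketch below (illustrative names; error handling simplified):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the suggested flow: the node request carries index names, the
// receiving node tries its local shard copies of each index one by one until a fetch
// succeeds, and indices with no successful copy are reported back so the coordinator
// can retry them on another node.
class PerIndexNodeHandlerSketch {
    record IndexResult(String index, Map<String, String> fieldTypes) {}
    record NodeResponse(List<IndexResult> results, List<String> unmatchedIndices) {}

    interface ShardFetcher {
        // Returns field caps for one local shard copy, or throws if that copy fails.
        Map<String, String> fetch(String index, int shardCopy) throws Exception;
    }

    static NodeResponse handle(List<String> indices, Map<String, List<Integer>> localCopies, ShardFetcher fetcher) {
        List<IndexResult> results = new ArrayList<>();
        List<String> unmatched = new ArrayList<>();
        for (String index : indices) {
            boolean matched = false;
            for (int copy : localCopies.getOrDefault(index, List.of())) {
                try {
                    results.add(new IndexResult(index, fetcher.fetch(index, copy)));
                    matched = true;
                    break; // any one matching copy is enough for field caps
                } catch (Exception e) {
                    // try the next local copy of this index
                }
            }
            if (matched == false) {
                unmatched.add(index);
            }
        }
        return new NodeResponse(results, unmatched);
    }

    public static void main(String[] args) {
        NodeResponse response = handle(List.of("logs-1"), Map.of("logs-1", List.of(0, 1)),
            (index, copy) -> Map.of("message", "text"));
        System.out.println(response);
    }
}
```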
The logic makes sense to me. I left some comments regarding the merging of responses and the bwc.
class FieldCapabilitiesNodeResponse extends ActionResponse implements Writeable {
    private final String[] indices;
    private final List<FieldCapabilitiesFailure> failures;
    private final List<FieldCapabilitiesIndexResponse> indexResponses;
We could do a pre-merge locally to reduce the size of the response? It seems wasteful to send back the entire list.
I agree, but I think this is nice to do as a follow-up to keep this PR simpler. Since it will affect the wire format, we should make the follow-up in this same release. I'll figure out how to do that (maybe we could have a short-lived feature branch).
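For reference, a minimal sketch of what such a local pre-merge could look like (simplified stand-in types; the real responses carry much more than a field-to-type map):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: instead of shipping one entry per index back to the coordinator,
// the data node could merge identical per-index field maps locally and send each
// distinct field map once, together with the indices it applies to.
class LocalPreMergeSketch {
    record IndexFieldCaps(String index, Map<String, String> fieldTypes) {}
    record MergedGroup(Set<String> indices, Map<String, String> fieldTypes) {}

    static List<MergedGroup> preMerge(List<IndexFieldCaps> perIndex) {
        Map<Map<String, String>, Set<String>> groups = new HashMap<>();
        for (IndexFieldCaps caps : perIndex) {
            groups.computeIfAbsent(caps.fieldTypes(), k -> new HashSet<>()).add(caps.index());
        }
        return groups.entrySet().stream()
            .map(e -> new MergedGroup(e.getValue(), e.getKey()))
            .toList();
    }

    public static void main(String[] args) {
        List<IndexFieldCaps> perIndex = List.of(
            new IndexFieldCaps("logs-1", Map.of("message", "text")),
            new IndexFieldCaps("logs-2", Map.of("message", "text")));
        // Both indices share an identical field map, so only one group goes on the wire.
        System.out.println(preMerge(perIndex));
    }
}
```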
// If all nodes are on version 7.16 or higher, then we group the shard requests and send a single request per node.
// Otherwise, for backwards compatibility we follow the old strategy of sending a separate request per shard.
We have to take the remote cluster into account. Would it be simpler to make the decision at a per-connection level? I think it's "ok" to emulate the old model by translating the node request into multiple shard requests on older connections.
I'm having trouble understanding this part. Could you explain why we need to take the remote cluster into account?
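To check my understanding, is it something like the sketch below, where the choice is made per target connection and older connections get the node request translated back into per-shard requests? (The version constant and types here are illustrative.)

```java
import java.util.List;

// Illustrative sketch: decide per connection whether the target understands the new
// node-level request. Older targets (including nodes reached through an older remote
// cluster connection) get the node request translated back into per-shard requests.
class BwcDispatchSketch {
    record Version(int major, int minor) {
        boolean onOrAfter(Version other) {
            return major > other.major || (major == other.major && minor >= other.minor);
        }
    }

    // Stand-in for the minimum version that understands node-level field caps requests.
    static final Version GROUPED_REQUEST_VERSION = new Version(7, 16);

    interface Transport {
        void sendNodeRequest(String nodeId, List<String> shardIds);
        void sendShardRequest(String nodeId, String shardId);
    }

    static void dispatch(String nodeId, Version connectionVersion, List<String> shardIds, Transport transport) {
        if (connectionVersion.onOrAfter(GROUPED_REQUEST_VERSION)) {
            transport.sendNodeRequest(nodeId, shardIds); // one request covering all shards on this node
        } else {
            for (String shardId : shardIds) {
                transport.sendShardRequest(nodeId, shardId); // emulate the old per-shard model
            }
        }
    }

    public static void main(String[] args) {
        Transport printing = new Transport() {
            public void sendNodeRequest(String nodeId, List<String> shardIds) {
                System.out.println("node request to " + nodeId + " for " + shardIds);
            }
            public void sendShardRequest(String nodeId, String shardId) {
                System.out.println("shard request to " + nodeId + " for " + shardId);
            }
        };
        dispatch("node-1", new Version(7, 16), List.of("logs-1[0]", "logs-2[0]"), printing);
        dispatch("node-2", new Version(7, 15), List.of("logs-3[0]"), printing);
    }
}
```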
@Override
public void messageReceived(final FieldCapabilitiesNodeRequest request,
                            final TransportChannel channel,
                            Task task) throws Exception {
Should we fork the execution outside of the network thread? The list of shards can be large on some tiers.
This is a good question! What do you think would be the right thread pool (maybe 'management')?
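Something along these lines, assuming we fork onto a bounded pool such as 'management' (sketched with plain JDK executors rather than the node's real ThreadPool):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: move the potentially large per-shard loop off the thread that
// received the transport message and onto a dedicated executor, then reply when done.
class ForkedNodeHandlerSketch {
    interface Channel {
        void sendResponse(String response);
    }

    private final ExecutorService executor = Executors.newFixedThreadPool(2); // stand-in for e.g. the 'management' pool

    void messageReceived(List<String> shardIds, Channel channel) {
        // Return immediately from the network thread; do the heavy work elsewhere.
        executor.execute(() -> {
            StringBuilder result = new StringBuilder();
            for (String shardId : shardIds) {
                result.append("caps(").append(shardId).append(") ");
            }
            channel.sendResponse(result.toString().trim());
        });
    }

    public static void main(String[] args) throws InterruptedException {
        ForkedNodeHandlerSketch handler = new ForkedNodeHandlerSketch();
        handler.messageReceived(List.of("logs-1[0]", "logs-2[0]"), System.out::println);
        Thread.sleep(200); // crude wait so the forked task can finish in this demo
        handler.executor.shutdown();
    }
}
```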
I discussed with @dnhatn and @jimczi and decided to merge the PR into a feature branch.
We also discussed @dnhatn's concern about the strategy for
Thanks Julie! Let's merge this work to the feature branch.
This adds a retry mechanism for node level field caps requests introduced in #77047.
Currently to gather field caps, the coordinator sends a separate transport request per index. When the original request targets many indices, the overhead of all these sub-requests can add up and hurt performance. This PR switches the execution strategy to reduce the number of transport requests: it groups together the index requests that target the same node, then sends only one request to each node.
Relates #77047. Relates #78647. Co-authored-by: Julie Tibshirani <[email protected]>
Addresses #74648.
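At its core, the new execution strategy is this grouping step, sketched here with simplified stand-in types:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the grouping step: instead of one transport request per shard,
// collect the shards whose assigned copy lives on the same node and send each node a
// single request covering all of them. Types are simplified stand-ins.
class GroupShardsByNodeSketch {
    record ShardTarget(String nodeId, String index, int shard) {}

    static Map<String, List<ShardTarget>> groupByNode(List<ShardTarget> targets) {
        Map<String, List<ShardTarget>> perNode = new HashMap<>();
        for (ShardTarget target : targets) {
            perNode.computeIfAbsent(target.nodeId(), k -> new ArrayList<>()).add(target);
        }
        return perNode;
    }

    public static void main(String[] args) {
        List<ShardTarget> targets = List.of(
            new ShardTarget("node-1", "logs-1", 0),
            new ShardTarget("node-1", "logs-2", 0),
            new ShardTarget("node-2", "logs-3", 0));
        // node-1 gets one request covering two shards; node-2 gets one request for one shard.
        groupByNode(targets).forEach((node, shards) -> System.out.println(node + " -> " + shards));
    }
}
```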